Learners sometimes think that if they sort a spreadsheet of data by any column
and then select a sample of rows from the top, they have automatically obtained an
SRS. This is not correct! Sorting by a real data column groups similar rows together. If you sort
names alphabetically, you will see patterns in names (such as religious names, or names
associated with certain languages, countries, or ethnicities). If you sort by another identifying
column, such as email address or city of residence, you will again see patterns in the data. If you
take rows from the top of data sorted this way, the sample will be biased, not random, and not representative.
That is why it is important to sort by a column of values produced by an RNG if you are taking an SRS
electronically.
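The idea of sorting by an RNG column can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name simple_random_sample, the made-up patient identifiers, and the seed value are all hypothetical, and the list of 100 patients stands in for whatever spreadsheet you actually have.

```python
import random

def simple_random_sample(rows, n, seed=None):
    """Attach a random number to each row, sort by it, and keep the top n rows."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    # The row index is included only to break ties, so rows are never compared directly.
    keyed = [(rng.random(), index, row) for index, row in enumerate(rows)]
    keyed.sort()  # sorting by the random key puts the rows in random order
    return [row for _, _, row in keyed[:n]]

# Hypothetical patient list standing in for a spreadsheet column.
patients = [f"patient_{i:03d}" for i in range(1, 101)]
sample = simple_random_sample(patients, 20, seed=42)
print(sample)
```

Because the sort key is unrelated to names, email addresses, or cities, the rows that land at the top carry no pattern from those columns.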
Taking an SRS intuitively seems like the optimal way to draw a representative sample.
However, there are caveats. In the previous example, you started with a clinical population in the
form of a printed or electronic list of patients from which you could draw a sample. But what if
you want to sample from patients presenting to the emergency department during a particular
period of time in the future? Such a list does not exist. In a situation like that, you could use
systematic sampling, which is explained later in the section “Engaging in systematic sampling.”
Another caveat of SRS is that it can miss important subgroups. Imagine that in your list of clinic
patients, only 10 percent were pediatric patients (defined as patients under the age of 18 years).
Because 10 percent of 20 is two, you may expect that a random sample of 20 patients from a
population where 10 percent are pediatric would include two pediatric patients. But in practice, in a
situation like this, it would not be unusual for an SRS of 20 patients to include zero pediatric patients.
If your sample needs to guarantee representation of certain subgroups, then you should consider using
stratified sampling instead.
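To see why a sample with zero pediatric patients is not unusual, you can estimate the probability directly. The sketch below makes two assumptions that are not part of the scenario: the first function treats each draw as independent (a reasonable approximation when the population is much larger than the sample), and the second uses a hypothetical clinic of 1,000 patients, 100 of them pediatric, to show the exact calculation for sampling without replacement. The function names are illustrative.

```python
from math import comb

def prob_zero_binomial(subgroup_share, sample_size):
    """Chance that no sampled patient is in the subgroup, treating draws as independent."""
    return (1 - subgroup_share) ** sample_size

def prob_zero_hypergeometric(population_size, subgroup_size, sample_size):
    """Exact chance of zero subgroup members when sampling without replacement."""
    return comb(population_size - subgroup_size, sample_size) / comb(population_size, sample_size)

p_binom = prob_zero_binomial(0.10, 20)
print(round(p_binom, 4))  # 0.1216, or roughly a 12 percent chance

# Hypothetical population size; the scenario itself does not depend on this number.
p_hyper = prob_zero_hypergeometric(1000, 100, 20)
print(p_hyper)
```

Under the independence approximation, about one SRS of 20 in every eight would contain no pediatric patients at all.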
Taking a stratified sample
In the previous section, we discussed a scenario where 10 percent of a clinic's patients are
pediatric, and where an SRS of 20 drawn from a list of the clinic population runs the
risk of not including any pediatric patients. If pediatric patients are important to the study, then this
problem can be solved with stratified sampling. The word stratum refers to a layer (as you see in a
layer cake), and the word strata is the plural of stratum. Stratified sampling can be seen as sampling
from strata, or layers.
In our scenario, if you choose to draw a stratified sample by age groups, you would first have to
separate the list into a pediatric list and a list of everyone else. Then, you could take an SRS from
each. Because you are concerned about each stratum, you could make a rule that even though pediatric
patients make up only 10 percent of the background population, you want them to make up 50 percent
of your sample. If you did that, then for a sample of 20 you would oversample from the
pediatric list by taking an SRS of 10 from it, while also taking an SRS of 10 from the list of everyone else.
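The two-list procedure described above can be sketched in Python as follows. This is a minimal illustration, not a prescribed method: the function name stratified_sample, the record layout with "id" and "age" fields, and the made-up clinic of 200 patients (20 of them pediatric) are all assumptions introduced for the example.

```python
import random

def stratified_sample(patients, n_pediatric, n_other, seed=None):
    """Split patients into two strata by age, then take an SRS from each stratum."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    pediatric = [p for p in patients if p["age"] < 18]   # stratum 1: under 18 years
    others = [p for p in patients if p["age"] >= 18]     # stratum 2: everyone else
    # random.sample draws without replacement, so no patient is picked twice.
    return rng.sample(pediatric, n_pediatric) + rng.sample(others, n_other)

# Hypothetical clinic list: 20 pediatric patients (10 percent) and 180 adults.
clinic = [{"id": i, "age": 5 + i % 13} for i in range(20)]
clinic += [{"id": i, "age": 18 + i % 60} for i in range(20, 200)]

sample = stratified_sample(clinic, n_pediatric=10, n_other=10, seed=7)
pediatric_count = sum(1 for p in sample if p["age"] < 18)
print(len(sample), pediatric_count)  # 20 10
```

Because each stratum is sampled separately, the number of pediatric patients in the sample is fixed in advance rather than left to chance.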